1. INTRODUCTION

Concrete is one of the most widely used construction materials in the world [1]. Components such as
cement, coarse and fine aggregate, and water are combined to make concrete [2], [3]. These components,
together with the age of the concrete, affect its compressive strength. To obtain the actual compressive
strength of concrete (the target labels in the dataset), an engineer breaks cylinder samples in a
compression-testing machine [4], [5]. The failure load is divided by the cylinder's cross-sectional area to
obtain the compressive strength. Engineers use different kinds of concrete for different building purposes.
For example, the strength of concrete used for residential buildings should not be lower than 2500 pounds
per square inch (psi), or 17.2 megapascals (MPa) [6]. Concrete has high strength in compression but low
strength in tension, which is why engineers use reinforced concrete (usually with steel rebars) to build
structures.
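
The strength calculation described above can be illustrated with a short sketch. The cylinder diameter and failure load below are hypothetical values chosen for illustration, not measurements from the dataset, and the function names are assumptions of this example.

```python
import math

PSI_PER_MPA = 145.0377  # 1 MPa is approximately 145.0377 psi


def compressive_strength_mpa(failure_load_n: float, diameter_mm: float) -> float:
    """Divide the failure load (N) by the circular cross-section (mm^2); N/mm^2 equals MPa."""
    area_mm2 = math.pi * (diameter_mm / 2.0) ** 2
    return failure_load_n / area_mm2


def mpa_to_psi(mpa: float) -> float:
    return mpa * PSI_PER_MPA


def psi_to_mpa(psi: float) -> float:
    return psi / PSI_PER_MPA


# Hypothetical test: a 150 mm diameter cylinder that fails at 400 kN.
strength_mpa = compressive_strength_mpa(400_000.0, 150.0)
strength_psi = mpa_to_psi(strength_mpa)

# Compare against the 2500 psi residential threshold quoted in the text.
meets_residential_minimum = strength_psi >= 2500.0
print(f"{strength_mpa:.2f} MPa = {strength_psi:.0f} psi, ok={meets_residential_minimum}")
```

With these assumed inputs the sample comes out at roughly 22.6 MPa, which is above the 2500 psi (about 17.2 MPa) threshold.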

The compressive strength of concrete is one of the most significant parameters in structural engineering
[7], because today's construction work requires high-strength concrete for higher durability. Determining the
strength of concrete takes time, planning, and financial resources because the commonly used compressive
strength value is obtained on the 28th day [8]. For this reason, concrete strength is estimated before the
concrete is used for building construction. The estimate is usually obtained by conducting laboratory tests,
but laboratory analysis of concrete samples requires significant experimentation time and cost.

Machine learning has been widely used for concrete strength prediction [9]. Moreover, there is a clear
opportunity for an automated model to reduce the waiting time involved in estimating concrete strength with
traditional laboratory tests. With a concrete dataset available, it is possible to develop a predictive model
that learns the relations between the composition variables and the strength. However, no previous work has
conducted extensive experiments to identify which regression model is best suited to concrete strength
prediction. Thus, this research evaluates the effectiveness of various supervised regression models for
predicting concrete strength from its composition, and investigates the variability among these models. The
objectives are (1) to develop different regression models for predicting the compressive strength of concrete,
(2) to compare the models using visualizations and measures of accuracy, and (3) to analyze the importance of
the concrete composition features for each model. The rest of this work is organized as follows: section 2
reviews the state-of-the-art concrete strength prediction models, section 3 presents the dataset and the
regression models used in the simulation, section 4 presents the results, and section 5 concludes the work.


2. RELATED WORK 

There have been significant efforts to use supervised machine learning algorithms to tackle
construction problems. Supervised algorithms and prediction models have been widely employed to estimate
the strength of concrete used in construction. For instance, in [10] a backpropagation (BP) neural network
(NN) is applied to the concrete compressive strength dataset to automate concrete strength analysis. Chen et
al. [11] employed a bagging classifier to develop an automated concrete compressive strength predictor. The
evaluation of predictive accuracy reveals that the artificial neural network outperforms the decision tree and
the bagging classifier for concrete strength prediction.

The performance of the K-nearest neighbor (KNN), random forest, and decision tree algorithms is
compared for concrete strength prediction in [10]. The comparison shows that random forest outperforms the
KNN and decision tree models, reaching an accuracy of 91.26% on concrete strength prediction. Similarly, in
[12], long short-term memory (LSTM) is applied to the concrete strength prediction problem. The authors also
employed a support vector regression algorithm to develop a model for strength prediction. As shown in the
result analysis, the support vector regression model achieved a root mean square error (RMSE) of 0.508 and an
R-squared value of 0.997. Moreover, the authors compare the conventional support vector regression model with
the developed LSTM model, and the result shows that LSTM outperformed the support vector regressor.

Advanced machine learning techniques are applied in [13] to develop a model that predicts concrete
strength. The study compared machine learning models such as the decision tree, AdaBoost regressor, and
bagging regressor. The result reveals that the bagging regressor or random forest regressor achieved higher
accuracy than AdaBoost and the decision tree for concrete strength prediction. Another study [14] applied
hybrid machine learning models to predict concrete strength. The researchers used an artificial neural network
(ANN) to develop the model, and evaluation on a concrete test set shows that the ANN model achieves 97%
accuracy. The performance of the support vector machine (SVM) and ANN is compared for concrete strength
prediction in [15], and the result reveals that SVM outperforms the ANN model. Although different models have
been developed for concrete strength prediction, the models in the literature leave scope for improvement in
terms of accuracy.


3. METHOD 

The concrete strength prediction dataset is obtained from the University of California Irvine (UCI)
machine learning repository. The dataset consists of 1,030 samples of different concrete compositions and
the corresponding concrete strength values. The UCI concrete compressive strength dataset is widely used to
develop machine learning models that predict concrete strength [16]-[18]. To develop the model, different
regression algorithms are employed. The features of the concrete compressive strength dataset are presented in
Table 1.


3.1. Performance measures 

To evaluate the performance of the regression models employed to predict concrete compressive
strength, the root mean square error (RMSE), R-squared, and accuracy are used as performance measures. The
RMSE is employed due to its wide applicability in regression model evaluation [23]-[25] and is determined by
the following formula [19]-[22].


$$\mathrm{RMSE} = \sqrt{\frac{\sum_{i=1}^{N} (P_i - O_i)^2}{N}} \tag{1}$$

where $\sum$ denotes the sum over all observations of the squared difference between the predicted and
observed values, $P_i$ is the predicted value for the $i$-th observation in the dataset, $O_i$ is the observed
value for the $i$-th observation in the dataset, and $N$ is the sample size.
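
As a worked illustration of equation (1), the sketch below computes RMSE for a small set of values. The R-squared function uses the standard coefficient-of-determination definition, which is an assumption here since the text does not give its formula, and the observed and predicted values are made up for the example.

```python
import math


def rmse(predicted, observed):
    """Root mean square error: sqrt of the mean squared difference, as in equation (1)."""
    n = len(observed)
    return math.sqrt(sum((p - o) ** 2 for p, o in zip(predicted, observed)) / n)


def r_squared(predicted, observed):
    """Coefficient of determination: 1 - SS_res / SS_tot (assumed standard definition)."""
    mean_observed = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for p, o in zip(predicted, observed))
    ss_tot = sum((o - mean_observed) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


# Hypothetical compressive strengths (MPa), not taken from the dataset.
observed = [30.0, 40.0, 50.0, 60.0]
predicted = [32.0, 38.0, 53.0, 59.0]

error = rmse(predicted, observed)       # sqrt((4 + 4 + 9 + 1) / 4) = sqrt(4.5)
fit = r_squared(predicted, observed)    # 1 - 18 / 500
print(f"RMSE = {error:.4f}, R-squared = {fit:.3f}")
```

A lower RMSE and an R-squared closer to 1 both indicate predictions that track the observed values more closely.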


Table 1. The performance of regression models for concrete strength prediction


No.  Model                        RMSE      R-squared
1    Linear regressor             0.628799  0.599578
2    Ridge regressor              0.628678  0.599819
3    K-Neighbors regressor        0.550102  0.694892
4    Decision tree regressor      0.458048  0.786737
5    Random forest regressor      0.340006  0.886869
6    Gradient boosting regressor  0.330691  0.890700
7    AdaBoost regressor           0.471189  0.775838
8    Support vector regressor     0.428961  0.814650


The regression models are trained on 7 input values of concrete composition parameters such as
water, cement, coarse and fine aggregate, and other features. The effect of each concrete composition
parameter on the concrete strength is examined. A correlation matrix is calculated by computing the Pearson
correlation coefficient for each pair of concrete composition parameters. The correlation matrix, which shows
the relationship between each pair of features in the concrete compressive strength dataset, is presented in
Figure 1.
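
A minimal sketch of how such a pairwise Pearson correlation matrix can be computed is given below. The column names and sample values are hypothetical placeholders rather than rows from the UCI dataset, and the helper functions are assumptions of this example.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return covariance / (spread_x * spread_y)


def correlation_matrix(columns):
    """Return a nested dict holding the coefficient for every pair of columns."""
    names = list(columns)
    return {a: {b: pearson(columns[a], columns[b]) for b in names} for a in names}


# Hypothetical mixes: cement and water in kg/m^3, strength in MPa.
samples = {
    "cement": [200.0, 250.0, 300.0, 350.0, 400.0],
    "water": [190.0, 185.0, 180.0, 170.0, 165.0],
    "strength": [22.0, 30.0, 33.0, 41.0, 48.0],
}

matrix = correlation_matrix(samples)
for name, row in matrix.items():
    print(name, {other: round(value, 2) for other, value in row.items()})
```

Each entry lies between -1 and 1, the diagonal is 1, and the matrix is symmetric, so only one triangle needs to be read.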

As shown in Figure 1, the concrete strength is highly affected by the cement parameter, which has a
correlation value of 0.50. Thus, a clear positive correlation is found between compressive strength and the
amount of cement in the composition of the concrete. Moreover, there is a strong positive correlation between
the concrete components superplasticizer and water, and similarly a positive correlation between
superplasticizer and fly ash. The 3-dimensional plot of the three most important features, namely cement,
superplasticizer, and age, against compressive strength is presented in Figure 2. As demonstrated in Figure 2,
cement and superplasticizer have a high impact on the concrete compressive strength.

Figure 1. Correlation matrix.

Figure 2. Three-dimensional plot of the most important features


4. RESULTS AND DISCUSSION

The variability among different regression models, namely linear regression, decision tree regression,
random forest regression, support vector regression, K-neighbors regression, gradient boosting regression, and
AdaBoost regression, is analyzed on the concrete compressive strength dataset. The results are presented in
tables and graphs using RMSE and R-squared as performance evaluation measures. Table 2 presents the
performance of the regression models on concrete strength prediction. As shown in Table 2, the gradient
boosting regressor performs better than the other regression models. Moreover, the performance of the
regression models on concrete strength prediction is shown in Figure 3.
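
The comparison procedure (fit several models on training data, then rank them by error on held-out data) can be sketched in a few lines. This is a deliberately simplified illustration and not the pipeline used in this study: it uses two small hand-written models, a one-feature linear fit and a k-nearest-neighbor regressor, and a synthetic nonlinear curve instead of the concrete dataset. All names are assumptions of the example.

```python
import math


def rmse(predicted, observed):
    return math.sqrt(sum((p - o) ** 2 for p, o in zip(predicted, observed)) / len(observed))


def fit_linear(xs, ys):
    """Ordinary least squares for a single feature; returns a prediction function."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    intercept = mean_y - slope * mean_x
    return lambda x: slope * x + intercept


def fit_knn(xs, ys, k=2):
    """k-nearest-neighbor regressor: average the targets of the k closest training points."""
    pairs = list(zip(xs, ys))

    def predict(x):
        nearest = sorted(pairs, key=lambda pair: abs(pair[0] - x))[:k]
        return sum(target for _, target in nearest) / k

    return predict


# Synthetic nonlinear data (y = x^2); even-indexed points train, odd-indexed points test.
xs = [float(i) for i in range(19)]
ys = [x ** 2 for x in xs]
train_x, train_y = xs[0::2], ys[0::2]
test_x, test_y = xs[1::2], ys[1::2]

models = {
    "linear": fit_linear(train_x, train_y),
    "knn": fit_knn(train_x, train_y, k=2),
}
scores = {name: rmse([model(x) for x in test_x], test_y) for name, model in models.items()}
best = min(scores, key=scores.get)
print(scores, "best:", best)
```

On this curved target the linear fit has a much larger held-out RMSE than the neighbor-based model, which mirrors the general pattern in Table 1 where the linear models trail the nonlinear ones.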


Table 2. The performance of regression models on concrete strength estimation


No.  Model                      Training score (%)  Test score (%)  95% confidence interval
1    Linear regression          93.2                86.3            -57 to 41.47
2    Lasso regression           74.2                67.9            -0.1 to 0.92
3    Adaboost regression        83.5                71.8            0.44 to 0.68
4    Random forest regression   83.2                74.4            0.30 to 0.78
5    Gradient boost regression  97.2                90.2            0.54 to 1.00

Figure 3. RMSE and R-squared comparison of regression models on concrete strength prediction.


4.1. Performance of the regression models for concrete strength estimation 

The regression models are compared using training and test accuracy for concrete strength estimation.
The comparative results for each regression model are presented in Table 2. As shown in Table 2, gradient
boosting regression is the best model, outperforming the other regression models with a reported 95%
confidence interval between 88% and 95%.

The confidence interval listed in Table 2 for the gradient boosting regressor is between 0.54 and
1.00. Moreover, the gradient boosting regressor has an accuracy of 90.2% on test data at a confidence level
of 95%. The test score of each regression model on concrete strength estimation is shown in Figure 4.
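
The text does not state how the 95% confidence intervals were computed. One common approach, shown here purely as an assumption, is a normal-approximation interval around the mean of several validation-fold scores. The fold scores below are hypothetical and are not results from this study.

```python
import math
import statistics


def mean_confidence_interval(scores, z=1.96):
    """Normal-approximation interval for the mean score: mean +/- z * s / sqrt(n)."""
    mean = statistics.mean(scores)
    standard_error = statistics.stdev(scores) / math.sqrt(len(scores))
    half_width = z * standard_error
    return mean, mean - half_width, mean + half_width


# Hypothetical test scores from five validation folds of a single model.
fold_scores = [0.84, 0.88, 0.86, 0.90, 0.82]

mean_score, lower, upper = mean_confidence_interval(fold_scores)
print(f"mean = {mean_score:.3f}, 95% CI = ({lower:.3f}, {upper:.3f})")
```

A narrow interval indicates that the model scores consistently across folds. A very wide one, such as the interval listed for linear regression in Table 2, points to unstable performance across folds.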


Figure 4. Accuracy of regression models for concrete strength estimation with 95% confidence (x-axis:
regression model; y-axis: accuracy in %)


5. CONCLUSION

This study evaluated the performance of regression models for concrete strength estimation.
Experimental simulation shows that the gradient boosting regressor gives a higher accuracy score than the
other regression models for concrete strength estimation. The experiments also show that, with validation, the
best result is obtained with the gradient boosting regressor. Thus, a gradient boosting regressor is better
suited for modeling the strength of high-performance concrete. Regression models are valuable for improving
concrete strength prediction and reducing the number of experimental tests required when checking concrete
composition. Finally, the study reveals that regression models are capable of mapping input features to the
target concrete compressive strength.